Optimal Data-Based Binning for Histograms

Author

  • Kevin H. Knuth
Abstract

Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. In this paper we introduce a straightforward data-based method of determining the optimal number of bins in a uniform bin-width histogram. Using the Bayesian framework, we derive the posterior probability for the number of bins in the density model given the data. The most probable solution is determined naturally by a balance between the likelihood function, which increases with increasing number of bins, and the prior probability of the model, which decreases with increasing number of bins. We demonstrate how these results outperform several well-accepted rules for choosing bin sizes even in the integrated square error sense. Last, we demonstrate that these results can be applied directly to multi-dimensional histograms.
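The balance described in the abstract can be illustrated with a short sketch. The abstract itself does not state the posterior, so the code assumes the relative log-posterior form usually quoted from the paper for M equal-width bins, N data points and bin counts n_k: N log M + log Γ(M/2) − M log Γ(1/2) − log Γ(N + M/2) + Σ log Γ(n_k + 1/2). The function names `log_posterior` and `optimal_bins`, the `max_bins` search limit, and the synthetic Gaussian sample are illustrative choices, not taken from the source.

```python
import math
import random


def log_posterior(data, m):
    """Relative log posterior for m equal-width bins (additive constant dropped).

    Assumed form (not given in the abstract):
    N log M + lgamma(M/2) - M lgamma(1/2) - lgamma(N + M/2) + sum lgamma(n_k + 1/2)
    """
    n = len(data)
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / m

    # Count points per bin; clamp so the maximum value lands in the last bin.
    counts = [0] * m
    for x in data:
        k = min(int((x - lo) / width), m - 1)
        counts[k] += 1

    return (n * math.log(m)
            + math.lgamma(m / 2)
            - m * math.lgamma(0.5)
            - math.lgamma(n + m / 2)
            + sum(math.lgamma(c + 0.5) for c in counts))


def optimal_bins(data, max_bins=100):
    """Return the bin count in 1..max_bins with the highest log posterior."""
    return max(range(1, max_bins + 1), key=lambda m: log_posterior(data, m))


# Synthetic example: 1000 draws from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
best = optimal_bins(sample)
print("most probable number of bins:", best)
```

A brute-force scan over candidate bin counts is used here because each evaluation is cheap; the range of candidates is a tuning choice and should be widened for larger data sets.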

Similar resources

Adaptive Binning and Dissimilarity Measure for Image Retrieval and Classification

Color histogram is an important part of content-based image retrieval systems. It is a common understanding that histograms that adapt to images can represent their color distributions more efficiently than histograms with fixed binnings. However, among existing dissimilarity measures, only the Earth Mover’s Distance can compare histograms with different binnings. This paper presents a detailed...

Experiments in Binning Image Statistics

Various vision tasks require computing image statistics and aggregating them into histograms. These histograms are usually compared using the χ² distance, which gives a rough idea of the similarity of the two image patches. For example, in [2] a histogram of key-points is collected on a rectangular grid for the category recognition task. Other researchers, [3], have used local image stat...

Adaptive histograms and dissimilarity measure for texture retrieval and classification

Histogram-based dissimilarity measures are extensively used for content-based image retrieval. In an earlier paper [1], we proposed an efficient weighted correlation dissimilarity measure for adaptive-binning color histograms. Compared to existing fixed-binning histograms and dissimilarity measures, adaptive histograms together with weighted correlation produce the best overall performance in t...

The analysis and applications of adaptive-binning color histograms

Histograms are commonly used in content-based image retrieval systems to represent the distributions of colors in images. It is a common understanding that histograms that adapt to images can represent their color distributions more efficiently than do histograms with fixed binnings. However, existing systems almost exclusively adopt fixed-binning histograms because, among existing well-known d...

Markov Chain Driven Multi-Dimensional Visual Pattern Analysis with Parallel Coordinates

Parallel coordinates is a widely used visualization technique for presenting, analyzing and exploring multidimensional data. However, like many other visualizations, it can suffer from an overplotting problem when rendering large data sets. Until now, quite a few methods are proposed to discover and illustrate the major data trends in cluttered parallel coordinates. Among them, frequency-based ...

Publication date: 2006